1. Video Coding
Video Compression
MIT 6.344, Spring 2004
John G. Apostolopoulos
Streaming Media Systems Group
Hewlett-Packard Laboratories
japos@hpl.hp.com
John G. Apostolopoulos
April 22, 2004 Page 1
2. Video
Coding
Overview of Next Three Lectures
Today • Video Compression (Thurs, 4/22)
– Principles and practice of video coding
– Basics behind MPEG compression algorithms
– Current image & video compression standards
• Video Communication & Video Streaming I (Tues, 4/27)
– Video application contexts & examples: DVD and Digital TV
– Challenges in video streaming over the Internet
– Techniques for overcoming these challenges
• Video Communication & Video Streaming II (Thurs, 4/29)
– Video over lossy packet networks and wireless links → Error-
resilient video communications
3. Outline of Today's Lecture
• Motivation for compression
• Brief review of generic compression system (from prior lecture)
• Brief review of image compression (from last lecture)
• Video compression
– Exploit temporal dimension of video signal
– Motion-compensated prediction
– Generic (MPEG-type) video coder architecture
– Scalable video coding
• Overview of current video compression standards
– What do the standards specify?
– Frame-based video coding: MPEG-1/2/4, H.261/3/4
– Object-based video coding: MPEG-4
4. Motivation for Compression: Example of HDTV Video Signal
• Problem:
– Raw video contains an immense amount of data
– Communication and storage capabilities are limited
and expensive
• Example HDTV video signal:
– 720x1280 pixels/frame, progressive scanning at
60 frames/s:
(720 × 1280 pixels/frame) × (60 frames/s) × (3 colors/pixel) × (8 bits/color) ≈ 1.3 Gb/s
– 20 Mb/s HDTV channel bandwidth
→ Requires compression by a factor of ~70 (equivalent to ~0.35 bits/pixel)
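The bit-rate arithmetic on this slide can be checked directly. This is a quick sketch; note the exact factor comes out near 66, which the slide rounds up to ~70.

```python
# Raw bit rate of the example HDTV signal:
# 720x1280 pixels/frame, 60 frames/s, 3 colors, 8 bits/color.
pixels_per_frame = 720 * 1280
raw_bps = pixels_per_frame * 60 * 3 * 8          # bits per second
channel_bps = 20e6                               # 20 Mb/s HDTV channel

compression_factor = raw_bps / channel_bps
bits_per_pixel = channel_bps / (pixels_per_frame * 60)

print(raw_bps / 1e9)        # ~1.33 Gb/s raw
print(compression_factor)   # ~66x needed (slide rounds to ~70)
print(bits_per_pixel)       # ~0.36 bits/pixel available
```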
5. Achieving Compression
• Reduce redundancy and irrelevancy
• Sources of redundancy
– Temporal: Adjacent frames highly correlated
– Spatial: Nearby pixels are often correlated with
each other
– Color space: RGB components are correlated
among themselves
→ Relatively straightforward to exploit
• Irrelevancy
– Perceptually unimportant information
→ Difficult to model and exploit
6. Spatial and Temporal Redundancy
• Why can video be compressed?
– Video contains much spatial and temporal redundancy.
• Spatial redundancy: Neighboring pixels are similar
• Temporal redundancy: Adjacent frames are similar
Compression is achieved by exploiting the spatial and temporal
redundancy inherent to video.
7. Outline of Today's Lecture
• Motivation for compression
• Brief review of generic compression system (from prior lecture)
• Brief review of image compression (from last lecture)
• Video compression
– Exploit temporal dimension of video signal
– Motion-compensated prediction
– Generic (MPEG-type) video coder architecture
– Scalable video coding
• Overview of current video compression standards
– What do the standards specify?
– Frame-based video coding: MPEG-1/2/4, H.261/3/4
– Object-based video coding: MPEG-4
8. Generic Compression System
Original Signal → Representation (Analysis) → Quantization → Binary Encoding → Compressed Bitstream
A compression system is composed of three key building blocks:
• Representation
– Concentrates important information into a few parameters
• Quantization
– Discretizes parameters
• Binary encoding
– Exploits non-uniform statistics of quantized parameters
– Creates bitstream for transmission
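The quantization stage above can be sketched with a uniform scalar quantizer. This is a minimal illustration; the step size and input value are made-up numbers, not parameters from the lecture.

```python
def quantize(x, step):
    """Uniform scalar quantizer: maps a real value to a small integer index."""
    return round(x / step)

def dequantize(q, step):
    """Inverse quantization: reconstructs an approximation of x."""
    return q * step

step = 4.0
x = 13.7
q = quantize(x, step)        # 3: only this small integer is entropy-coded
x_hat = dequantize(q, step)  # 12.0: reconstruction error is at most step/2
print(q, x_hat)
```

All the loss in the system is introduced right here, in the rounding inside `quantize`; the other stages are invertible.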
9. Generic Compression System (cont.)
Original Signal → Representation (Analysis) [generally lossless] → Quantization [lossy] → Binary Encoding [lossless] → Compressed Bitstream
• Generally, the only operation that is lossy is the
quantization stage
• The fact that all the loss (distortion) is localized to a
single operation greatly simplifies system design
• Can design loss to exploit human visual system (HVS)
properties
10. Generic Compression System (cont.)
Source Encoder: Original Signal → Representation (Analysis) → Quantization → Binary Encoding → Channel
Source Decoder: Channel → Binary Decoding → Inverse Quantization → Representation (Synthesis) → Reconstructed Signal
• Source decoder performs the inverse of each of the three
operations
11. Review of Image Compression
Original Image → RGB to YUV → Block DCT → Quantization → Runlength & Huffman Coding → Compressed Bitstream
• Coding an image (single frame):
– RGB to YUV color-space conversion
– Partition image into 8x8-pixel blocks
– 2-D DCT of each block
– Quantize each DCT coefficient
– Runlength and Huffman code the nonzero quantized DCT
coefficients
→ Basis for the JPEG Image Compression Standard
→ JPEG-2000 uses wavelet transform and arithmetic coding
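The 2-D DCT step in this pipeline can be sketched directly from its definition. This is an illustrative pure-Python DCT-II (real codecs use fast separable implementations); the constant test block is a made-up example.

```python
import math

def dct_2d(block):
    """2-D DCT-II of an n x n block, as applied to 8x8 blocks in JPEG/MPEG."""
    n = len(block)
    def c(k):  # orthonormal scaling factors
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = c(u) * c(v) * s
    return out

# A flat (constant) 8x8 block: the DCT concentrates all of its energy
# into the single DC coefficient, coeffs[0][0] = 8 * 100; every AC
# coefficient is zero, which is exactly what makes it cheap to code.
flat = [[100] * 8 for _ in range(8)]
coeffs = dct_2d(flat)
```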
12. Outline of Today's Lecture
• Motivation for compression
• Brief review of generic compression system (from prior lecture)
• Brief review of image compression (from last lecture)
• Video compression
– Exploit temporal dimension of video signal
– Motion-compensated prediction
– Generic (MPEG-type) video coder architecture
– Scalable video coding
• Overview of current video compression standards
– What do the standards specify?
– Frame-based video coding: MPEG-1/2/4, H.261/3/4
– Object-based video coding: MPEG-4
13. Video Compression
• Video: Sequence of frames (images) that are related
• Related along the temporal dimension
– Therefore temporal redundancy exists
• Main addition over image compression
– Temporal redundancy
→ Video coder must exploit the temporal redundancy
14. Temporal Processing
• Usually high frame rate: Significant temporal redundancy
• Possible representations along temporal dimension:
– Transform/subband methods:
- Good for textbook case of constant-velocity uniform global motion
- Inefficient for nonuniform motion, i.e., real-world motion
- Requires a large number of frame stores → leads to delay (memory cost may also be an issue)
– Predictive methods:
- Good performance using only 2 frame stores
- However, simple frame differencing is not enough…
15. Video Compression
• Goal: Exploit the temporal redundancy
• Predict current frame based on previously coded frames
• Three types of coded frames:
– I-frame: Intra-coded frame, coded independently of all
other frames
– P-frame: Predictively coded frame, coded based on
previously coded frame
– B-frame: Bi-directionally predicted frame, coded based
on both previous and future coded frames
[Figure: I-frame, P-frame, and B-frame prediction dependencies]
16. Temporal Processing: Motion-Compensated Prediction
• Simple frame differencing fails when there is motion
• Must account for motion
→ Motion-compensated (MC) prediction
• MC-prediction generally provides significant improvements
• Questions:
– How can we estimate motion?
– How can we form MC-prediction?
17. Temporal Processing: Motion Estimation
• Ideal situation:
– Partition video into moving objects
– Describe object motion
→ Generally very difficult
• Practical approach: Block-Matching Motion Estimation
– Partition each frame into blocks, e.g. 16x16 pixels
– Describe motion of each block
→ No object identification required
→ Good, robust performance
18. Block-Matching Motion Estimation
[Figure: numbered blocks in the reference frame matched to blocks in the current frame; the displacement of each block is its motion vector (mv1, mv2)]
• Assumptions:
– Translational motion within block:
f(n1, n2, k_cur) = f(n1 − mv1, n2 − mv2, k_ref)
– All pixels within each block have the same motion
• ME Algorithm:
1) Divide current frame into non-overlapping N1xN2 blocks
2) For each block, find the best matching block in reference frame
• MC-Prediction Algorithm:
– Use best matching blocks of reference frame as prediction of
blocks in current frame
19. Block Matching: Determining the Best Matching Block
• For each block in the current frame search for best matching
block in the reference frame
– Metrics for determining “best match”:
MSE = Σ_{(n1,n2)∈Block} [f(n1, n2, k_cur) − f(n1 − mv1, n2 − mv2, k_ref)]²
MAE = Σ_{(n1,n2)∈Block} |f(n1, n2, k_cur) − f(n1 − mv1, n2 − mv2, k_ref)|
– Candidate blocks: All blocks in, e.g., a (±32, ±32) pixel area
– Strategies for searching candidate blocks for best match
– Full search: Examine all candidate blocks
– Partial (fast) search: Examine a carefully selected subset
• Estimate of motion for best matching block: “motion vector”
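The full-search strategy above can be sketched end to end. This is a toy illustration with made-up frame contents; frames are plain lists of lists, MAE is the matching metric, and the returned offset is the position of the best matching block in the reference frame.

```python
def mae(block_a, block_b):
    """Mean absolute error between two equal-sized blocks."""
    n = len(block_a) * len(block_a[0])
    return sum(abs(a - b)
               for ra, rb in zip(block_a, block_b)
               for a, b in zip(ra, rb)) / n

def get_block(frame, top, left, size):
    return [row[left:left + size] for row in frame[top:top + size]]

def full_search(cur, ref, top, left, size, search_range):
    """Full search: examine every candidate offset, keep the lowest MAE."""
    target = get_block(cur, top, left, size)
    best = (None, float("inf"))
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ty, tx = top + dy, left + dx
            if ty < 0 or tx < 0 or ty + size > len(ref) or tx + size > len(ref[0]):
                continue  # candidate block falls outside the reference frame
            err = mae(target, get_block(ref, ty, tx, size))
            if err < best[1]:
                best = ((dy, dx), err)
    return best

# Toy example: a 4x4 bright patch shifts 2 pixels to the right between frames.
ref = [[0] * 16 for _ in range(16)]
cur = [[0] * 16 for _ in range(16)]
for r in range(4, 8):
    for c in range(4, 8):
        ref[r][c] = 200      # patch at columns 4..7 in the reference frame
    for c in range(6, 10):
        cur[r][c] = 200      # patch at columns 6..9 in the current frame
mv, err = full_search(cur, ref, top=4, left=6, size=4, search_range=3)
print(mv, err)   # (0, -2), 0.0: the match lies 2 pixels to the left in ref
```

The matched reference block then serves directly as the MC-prediction of the current block.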
20. Motion Vectors and Motion Vector Field
• Motion vector
– Expresses the relative horizontal and vertical offsets
(mv1,mv2), or motion, of a given block from one
frame to another
– Each block has its own motion vector
• Motion vector field
– Collection of motion vectors for all the blocks in a
frame
21. Example of Fast Motion Estimation Search: 3-Step (Log) Search
• Goal: Reduce number of search points
• Example: (±7, ±7) search area
• Dots represent search points
• Search performed in 3 steps (coarse-to-fine):
Step 1: ±4 pixels
Step 2: ±2 pixels
Step 3: ±1 pixel
• Best match is found at each step
• Next step: Search is centered around the best match of prior step
• Speedup increases for larger search areas
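The 3-step search can be sketched as follows. The `cost` surface here is a made-up quadratic with a known minimum, standing in for a real block-matching error; the point is the coarse-to-fine re-centering and the reduced point count.

```python
def three_step_search(cost, start=(0, 0)):
    """3-step (log) search over a (+-7, +-7) area.

    cost(dy, dx) returns the matching error for a candidate offset;
    the step halves (4 -> 2 -> 1), re-centering on each step's best match.
    """
    center = start
    for step in (4, 2, 1):
        candidates = [(center[0] + sy * step, center[1] + sx * step)
                      for sy in (-1, 0, 1) for sx in (-1, 0, 1)]
        center = min(candidates, key=lambda p: cost(*p))
    return center

# Hypothetical error surface with its minimum at offset (3, -5):
evaluated = []
def cost(dy, dx):
    evaluated.append((dy, dx))
    return (dy - 3) ** 2 + (dx + 5) ** 2

mv = three_step_search(cost)
print(mv)                 # (3, -5)
print(len(evaluated))     # 27 points, vs 225 for a full (+-7, +-7) search
```

On this smooth surface the fast search lands on the true minimum; on real video the error surface can have local minima, which is the price of the speedup.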
22. Motion Vector Precision?
• Motivation:
– Motion is not limited to integer-pixel offsets
– However, video only known at discrete pixel locations
– To estimate sub-pixel motion, frames must be spatially
interpolated
• Fractional MVs are used to represent the sub-pixel motion
• Improved performance (extra complexity is worthwhile)
• Half-pixel ME used in most standards: MPEG-1/2/4
• Why are half-pixel motion vectors better?
– Can capture half-pixel motion
– Averaging effect (from spatial interpolation) reduces
prediction error → Improved prediction
– For noisy sequences, averaging effect reduces noise →
Improved compression
23. Practical Half-Pixel Motion Estimation Algorithm
• Half-pixel ME (coarse-fine) algorithm:
1) Coarse step: Perform integer motion estimation on blocks; find
best integer-pixel MV
2) Fine step: Refine estimate to find best half-pixel MV
a) Spatially interpolate the selected region in reference frame
b) Compare current block to interpolated reference frame
block
c) Choose the integer or half-pixel offset that provides best
match
• Typically, bilinear interpolation is used for spatial interpolation
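The bilinear interpolation used in the fine step can be sketched as below. This is a minimal illustration on a made-up 2x2 frame; half-pixel positions are addressed in doubled coordinates (y2, x2) so that integer and half-pixel samples share one indexing scheme.

```python
def half_pixel_value(frame, y2, x2):
    """Sample frame at half-pixel position (y2/2, x2/2) by bilinear
    interpolation, as in the refinement step of half-pixel ME."""
    y, x = y2 // 2, x2 // 2
    if y2 % 2 == 0 and x2 % 2 == 0:          # integer position
        return frame[y][x]
    if y2 % 2 == 0:                          # horizontal half-pixel: avg of 2
        return (frame[y][x] + frame[y][x + 1]) / 2
    if x2 % 2 == 0:                          # vertical half-pixel: avg of 2
        return (frame[y][x] + frame[y + 1][x]) / 2
    return (frame[y][x] + frame[y][x + 1]    # diagonal half-pixel: avg of 4
            + frame[y + 1][x] + frame[y + 1][x + 1]) / 4

frame = [[10, 20],
         [30, 40]]
h = half_pixel_value(frame, 0, 1)   # 15.0: midway between 10 and 20
d = half_pixel_value(frame, 1, 1)   # 25.0: average of all four neighbors
```

The averaging visible here is also the source of the noise-reduction benefit mentioned on the previous slide.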
24. Example: MC-Prediction for Two Consecutive Frames
[Figure: previous (reference) frame and current frame (to be predicted); numbered blocks are matched between the two frames to form the predicted frame]
25. Example: MC-Prediction for Two Consecutive Frames (cont.)
[Figure: prediction of the current frame, and the resulting prediction error (residual)]
26. Block Matching Algorithm: Summary
• Issues:
– Block size?
– Search range?
– Motion vector accuracy?
• Motion typically estimated only from luminance
• Advantages:
– Good, robust performance for compression
– Resulting motion vector field is easy to represent (one MV
per block) and useful for compression
– Simple, periodic structure, easy VLSI implementations
• Disadvantages:
– Assumes translational motion model → Breaks down for
more complex motion
– Often produces blocking artifacts (OK for coding with
Block DCT)
27. Bi-Directional MC-Prediction
[Figure: numbered blocks in the previous frame, current frame, and future frame; each block in the current frame can be predicted from either neighbor frame]
• Bi-Directional MC-Prediction is used to estimate a block in the
current frame from a block in:
1) Previous frame
2) Future frame
3) Average of a block from the previous frame and a block
from the future frame
4) Neither, i.e. code current block without prediction
28. MC-Prediction and Bi-Directional MC-Prediction (P- and B-frames)
• Motion compensated prediction: Predict the current frame
based on reference frame(s) while compensating for the motion
• Examples of block-based motion-compensated prediction
(P-frame) and bi-directional prediction (B-frame):
[Figure: block-based MC-prediction of a P-frame from the previous frame, and bi-directional prediction of a B-frame from both the previous and future frames]
29. Video Compression
• Main addition over image compression:
– Exploit the temporal redundancy
• Predict current frame based on previously coded frames
• Three types of coded frames:
– I-frame: Intra-coded frame, coded independently of all
other frames
– P-frame: Predictively coded frame, coded based on
previously coded frame
– B-frame: Bi-directionally predicted frame, coded based
on both previous and future coded frames
[Figure: I-frame, P-frame, and B-frame prediction dependencies]
30. Example Use of I-, P-, B-frames: MPEG Group of Pictures (GOP)
• Arrows show prediction dependencies between frames
I0 B1 B2 P3 B4 B5 P6 B7 B8 I9
MPEG GOP
31. Summary of Temporal Processing
• Use MC-prediction (P and B frames) to reduce temporal
redundancy
• MC-prediction usually performs well; in compression we have a second chance to recover when it performs badly
• MC-prediction yields:
– Motion vectors
– MC-prediction error or residual → Code error with
conventional image coder
• Sometimes MC-prediction may perform badly
– Examples: Complex motion, new imagery (occlusions)
– Approach:
1. Identify frame or individual blocks where prediction fails
2. Code without prediction
32. Basic Video Compression Architecture
• Exploiting the redundancies:
– Temporal: MC-prediction (P and B frames)
– Spatial: Block DCT
– Color: Color space conversion
• Scalar quantization of DCT coefficients
• Zigzag scanning, runlength and Huffman coding of the
nonzero quantized DCT coefficients
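The zigzag-scan and runlength steps above can be sketched as follows. This is an illustrative 4x4 example with made-up quantized coefficients (real coders use 8x8 blocks and follow the runlength pairs with Huffman coding); "EOB" stands in for the end-of-block symbol.

```python
def zigzag_indices(n):
    """Zigzag scan order of an n x n block, low to high frequency:
    anti-diagonals of increasing index, alternating direction."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def runlength(coeffs):
    """(zero-run, value) pairs for nonzero coefficients, plus an
    end-of-block marker once only zeros remain."""
    pairs, run = [], 0
    for v in coeffs:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")
    return pairs

# Quantized block: energy packed into a few low-frequency coefficients.
block = [[16, -3, 0, 0],
         [ 5,  0, 0, 0],
         [ 0,  0, 0, 0],
         [ 0,  0, 0, 0]]
scan = [block[r][c] for r, c in zigzag_indices(4)]
symbols = runlength(scan)
print(symbols)   # [(0, 16), (0, -3), (0, 5), 'EOB']
```

The zigzag ordering groups the trailing zeros together, so one EOB symbol replaces the entire high-frequency tail.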
33. Example Video Encoder
[Figure: example hybrid video encoder]
Input video signal → RGB to YUV → subtract MC-prediction (residual) → DCT → Quantize → Huffman Coding → Buffer → output bitstream
• Buffer fullness feeds back to control the quantizer
• Decoder-in-the-loop: Inverse Quantize → Inverse DCT → add MC-prediction → Frame Store (previous reconstructed frame)
• Motion Estimation against the previous reconstructed frame produces MV data (also sent to the Huffman coder); Motion Compensation forms the MC-prediction
34. Example Video Decoder
[Figure: example video decoder]
Input bitstream → Buffer → Huffman Decoder → Inverse Quantize → Inverse DCT → add MC-prediction (reconstructed residual + prediction) → reconstructed frame → YUV to RGB → output video signal
• Motion Compensation uses the decoded MV data and the previous reconstructed frame (Frame Store) to form the MC-prediction
35. Outline of Today's Lecture
• Motivation for compression
• Brief review of generic compression system (from prior lecture)
• Brief review of image compression (from last lecture)
• Video compression
– Exploit temporal dimension of video signal
– Motion-compensated prediction
– Generic (MPEG-type) video coder architecture
– Scalable video coding
• Overview of current video compression standards
– What do the standards specify?
– Frame-based video coding: MPEG-1/2/4, H.261/3/4
– Object-based video coding: MPEG-4
36. Motivation for Scalable Coding
Basic situation:
1. Diverse receivers may request the same video
– Different bandwidths, spatial resolutions, frame rates,
computational capabilities
2. Heterogeneous networks and a priori unknown network conditions
– Wired and wireless links, time-varying bandwidths
→ When you originally code the video you don’t know which client
or network situation will exist in the future
→ Probably have multiple different situations, each requiring a
different compressed bitstream
→ Need a different compressed video matched to each situation
• Possible solutions:
1. Compress & store MANY different versions of the same video
2. Real-time transcoding (e.g. decode/re-encode)
3. Scalable coding
37. Scalable Video Coding
• Scalable coding:
– Decompose video into multiple layers of prioritized
importance
– Code layers into base and enhancement bitstreams
– Progressively combine one or more bitstreams to produce
different levels of video quality
• Example of scalable coding with base and two enhancement
layers: Can produce three different qualities
1. Base layer
2. Base + Enh1 layers
3. Base + Enh1 + Enh2 layers
(quality increases from 1 to 3)
• Scalability with respect to: Spatial or temporal resolution, bit
rate, computation, memory
38. Example of Scalable Coding
• Encode image/video into three layers: Base, Enh1, Enh2
• Low-bandwidth receiver: Send only the Base layer → decoder produces low-res video
• Medium-bandwidth receiver: Send Base & Enh1 layers → medium-res video
• High-bandwidth receiver: Send all three layers → high-res video
• Can adapt to different clients and network situations
39. Scalable Video Coding (cont.)
• Three basic types of scalability (refine video quality
along three different dimensions):
– Temporal scalability → Temporal resolution
– Spatial scalability → Spatial resolution
– SNR (quality) scalability → Amplitude resolution
• Each type of scalable coding provides scalability of one
dimension of the video signal
– Can combine multiple types of scalability to provide
scalability along multiple dimensions
40. Scalable Coding: Temporal Scalability
• Temporal scalability: Based on the use of B-frames to
refine the temporal resolution
– B-frames are dependent on other frames
– However, no other frame depends on a B-frame
– Each B-frame may be discarded without affecting
other frames
I0 B1 B2 P3 B4 B5 P6 B7 B8 I9
MPEG GOP
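Temporal scalability can be sketched directly from the GOP shown above: since no other frame depends on a B-frame, dropping every B-frame yields a valid lower-frame-rate base layer.

```python
# The GOP from the slide, labeled by frame type and display index.
gop = ["I0", "B1", "B2", "P3", "B4", "B5", "P6", "B7", "B8", "I9"]

# Base temporal layer: discard every B-frame. The remaining I- and
# P-frames still decode correctly, because I-frames are independent
# and P-frames only reference previously coded I-/P-frames.
base_layer = [f for f in gop if not f.startswith("B")]
print(base_layer)   # ['I0', 'P3', 'P6', 'I9']: one third of the frame rate
```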
41. Scalable Coding: Spatial Scalability
• Spatial scalability: Based on refining the spatial resolution
– Base layer is low resolution version of video
– Enh1 contains coded difference between upsampled
base layer and original video
– Also called: Pyramid coding
[Figure: pyramid coding. Original video → ↓2 → base-layer encoder/decoder → low-res video; the decoded base layer is upsampled (↑2) and subtracted from the original, and the difference is coded as the Enh layer; the decoder adds the decoded difference to the upsampled base layer to obtain high-res video]
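The pyramid structure can be sketched in one dimension. This toy example uses pair-averaging for ↓2 and sample repetition for ↑2 (stand-ins for real filters) and omits the lossy coding of each layer, so the reconstruction here is exact.

```python
def downsample(signal):
    """Halve resolution by averaging pairs (stand-in for a real filter)."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]

def upsample(signal):
    """Double resolution by sample repetition (simplest interpolator)."""
    return [v for v in signal for _ in range(2)]

original = [10.0, 12.0, 20.0, 22.0, 30.0, 32.0, 40.0, 42.0]
base = downsample(original)          # low-res base layer
residual = [o - u for o, u in zip(original, upsample(base))]   # Enh layer

# Decoder: the base layer alone gives low-res video; adding the residual
# to the upsampled base layer recovers the full-resolution signal.
reconstructed = [u + r for u, r in zip(upsample(base), residual)]
print(reconstructed == original)   # True here; real coding adds loss
```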
42. Scalable Coding: SNR (Quality) Scalability
• SNR (Quality) Scalability: Based on refining the
amplitude resolution
– Base layer uses a coarse quantizer
– Enh1 applies a finer quantizer to the difference
between the original DCT coefficients and the
coarsely quantized base layer coefficients
[Figure: base-layer I- and P-frames refined by enhancement-layer EI- and EP-frames. Note: base & enhancement layers are at the same spatial resolution]
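The two-stage quantization behind SNR scalability can be sketched on a single coefficient. The coefficient value and step sizes are illustrative numbers, not values from any standard.

```python
def quantize(x, step):
    """Uniform quantize-and-reconstruct with the given step size."""
    return round(x / step) * step

coeff = 137.0                        # an original DCT coefficient
base = quantize(coeff, 32)          # coarse base-layer quantizer
enh = quantize(coeff - base, 8)     # finer quantizer on the difference

print(base)        # 128.0: base-layer reconstruction (error 9)
print(base + enh)  # 136.0: refined reconstruction with Enh1 (error 1)
```

Each enhancement layer shrinks the quantization error of the layer below it, refining amplitude resolution without changing spatial or temporal resolution.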
43. Summary of Scalable Video Coding
• Three basic types of scalable video coding:
– Temporal scalability
– Spatial scalability
– SNR (quality) scalability
• Scalable coding produces different layers with prioritized
importance
• Prioritized importance is key for a variety of applications:
– Adapting to different bandwidths, or client resources
such as spatial or temporal resolution or computational
power
– Facilitates error-resilience by explicitly identifying most
important and less important bits
44. Outline of Today's Lecture
• Motivation for compression
• Brief review of generic compression system (from prior lecture)
• Brief review of image compression (from last lecture)
• Video compression
– Exploit temporal dimension of video signal
– Motion-compensated prediction
– Generic (MPEG-type) video coder architecture
– Scalable video coding
• Overview of current video compression standards
– What do the standards specify?
– Frame-based video coding: MPEG-1/2/4, H.261/3/4
– Object-based video coding: MPEG-4
45. Motivation for Standards
• Goal of standards:
– Ensuring interoperability: Enabling communication
between devices made by different manufacturers
– Promoting a technology or industry
– Reducing costs
46. What do the Standards Specify?
[Figure: Encoder → Bitstream → Decoder]
47. What do the Standards Specify? (cont.)
[Figure: Encoder → Bitstream (decoding process) → Decoder; the scope of standardization covers only the bitstream and the decoding process]
• Not the encoder
• Not the decoder
• Just the bitstream syntax and the decoding process (e.g. use IDCT, but not how to implement the IDCT)
→ Enables improved encoding & decoding strategies to be employed in a standard-compatible manner
48. Current Image and Video Compression Standards
Standard            Application                                      Bit Rate
JPEG                Continuous-tone still-image compression          Variable
H.261               Video telephony and teleconferencing over ISDN   p x 64 kb/s
MPEG-1              Video on digital storage media (CD-ROM)          1.5 Mb/s
MPEG-2              Digital Television                               2-20 Mb/s
H.263               Video telephony over PSTN                        33.6-? kb/s
MPEG-4              Object-based coding, synthetic content,          Variable
                    interactivity
JPEG-2000           Improved still image compression                 Variable
H.264 / MPEG-4 AVC  Improved video compression                       10's to 100's kb/s
49. Comparing Current Video Compression Standards
• Based on the same fundamental building blocks
– Motion-compensated prediction (I, P, and B frames)
– 2-D Discrete Cosine Transform (DCT)
– Color space conversion
– Scalar quantization, runlengths, Huffman coding
• Additional tools added for different applications:
– Progressive or interlaced video
– Improved compression, error resilience, scalability, etc.
• MPEG-1/2/4, H.261/3/4: Frame-based coding
• MPEG-4: Object-based coding and Synthetic video
50. MPEG Group of Pictures (GOP) Structure
• Composed of I, P, and B frames
• Arrows show prediction dependencies
• Periodic I-frames enable random access into the coded bitstream
• Parameters: (1) Spacing between I frames, (2) number of B frames
between I and P frames
I0 B1 B2 P3 B4 B5 P6 B7 B8 I9
MPEG GOP
51. MPEG Structure
• MPEG codes video in a hierarchy of layers. The
sequence layer is not shown.
[Figure: hierarchy of layers: GOP layer (I, P, B pictures) → Picture layer → Slice layer → Macroblock layer (1 MV, 4 8x8 blocks) → Block layer (8x8 DCT)]
52. MPEG-2 Profiles and Levels
• Goal: To enable more efficient implementations for
different applications (interoperability points)
– Profile: Subset of the tools applicable for a family of
applications
– Level: Bounds on the complexity for any profile
[Figure: grid of Profile (Simple, Main, High) vs Level (Low, Main, High). HDTV: Main Profile at High Level (MP@HL); DVD & SD Digital TV: Main Profile at Main Level (MP@ML)]
53. MPEG-4 Natural Video Coding
• Extension of MPEG-1/2-type algorithms to code
arbitrarily shaped objects
[Figure: frame-based coding vs object-based coding; MPEG Committee]
Basic Idea: Extend Block-DCT and Block-ME/MC-prediction
to code arbitrarily shaped objects
54. Example of MPEG-4 Scene (Object-based Coding)
[Figure: example MPEG-4 scene composed of objects; MPEG Committee]
55. Example MPEG-4 Object Decoding Process
[MPEG Committee]
56. Sprite Coding (Background Prediction)
• Sprite: Large background image
– Hypothesis: Same background exists for many frames,
changes resulting from camera motion and occlusions
• One possible coding strategy:
1. Code & transmit entire sprite once
2. Only transmit camera motion parameters for each
subsequent frame
• Significant coding gain for some scenes
57. Sprite Coding Example
[Figure: sprite (background), foreground object, and reconstructed frame; MPEG Committee]
58. Review of Today's Lecture
• Motivation for compression
• Brief review of generic compression system (from prior lecture)
• Brief review of image compression (from last lecture)
• Video compression
– Exploit temporal dimension of video signal
– Motion-compensated prediction
– Generic (MPEG-type) video coder architecture
– Scalable video coding
• Overview of current video compression standards
– What do the standards specify?
– Frame-based video coding: MPEG-1/2/4, H.261/3/4
– Object-based video coding: MPEG-4
59. References and Further Reading
General Video Compression References:
• J.G. Apostolopoulos and S.J. Wee, "Video Compression Standards", Wiley Encyclopedia of Electrical and Electronics Engineering, New York: John Wiley & Sons, Inc., 1999.
• V. Bhaskaran and K. Konstantinides, Image and Video Compression Standards: Algorithms and Architectures, Boston: Kluwer Academic Publishers, 1997.
• J.L. Mitchell, W.B. Pennebaker, C.E. Fogg, and D.J. LeGall, MPEG Video Compression Standard, New York: Chapman & Hall, 1997.
• B.G. Haskell, A. Puri, and A.N. Netravali, Digital Video: An Introduction to MPEG-2, Boston: Kluwer Academic Publishers, 1997.
MPEG web site:
http://drogo.cselt.stet.it/mpeg